20 research outputs found
Strongly Hierarchical Factorization Machines and ANOVA Kernel Regression
High-order parametric models that include terms for feature interactions are
applied to various data mining tasks, where ground truth depends on
interactions of features. However, with sparse data, the high- dimensional
parameters for feature interactions often face three issues: expensive
computation, difficulty in parameter estimation and lack of structure. Previous
work has proposed approaches which can partially re- solve the three issues. In
particular, models with factorized parameters (e.g. Factorization Machines) and
sparse learning algorithms (e.g. FTRL-Proximal) can tackle the first two issues
but fail to address the third. Regarding to unstructured parameters,
constraints or complicated regularization terms are applied such that
hierarchical structures can be imposed. However, these methods make the
optimization problem more challenging. In this work, we propose Strongly
Hierarchical Factorization Machines and ANOVA kernel regression where all the
three issues can be addressed without making the optimization problem more
difficult. Experimental results show the proposed models significantly
outperform the state-of-the-art in two data mining tasks: cold-start user
response time prediction and stock volatility prediction.Comment: 9 pages, to appear in SDM'1
A Non-Parametric Learning Approach to Identify Online Human Trafficking
Human trafficking is among the most challenging law enforcement problems
which demands persistent fight against from all over the globe. In this study,
we leverage readily available data from the website "Backpage"-- used for
classified advertisement-- to discern potential patterns of human trafficking
activities which manifest online and identify most likely trafficking related
advertisements. Due to the lack of ground truth, we rely on two human analysts
--one human trafficking victim survivor and one from law enforcement, for
hand-labeling the small portion of the crawled data. We then present a
semi-supervised learning approach that is trained on the available labeled and
unlabeled data and evaluated on unseen data with further verification of
experts.Comment: Accepted in IEEE Intelligence and Security Informatics 2016
Conference (ISI 2016